# Using descriptive statistics

Statistics describe important aspects of our data, often revealing deeper insights.

Statistics is a branch of mathematics concerned with data collection, analysis, interpretation, presentation, and organization. 

It plays a crucial role in various fields, from business and economics to healthcare and social sciences. Using statistical techniques, we can describe essential aspects of our data and uncover patterns and trends that may not be immediately apparent. Statistics can help us make informed decisions, identify potential problems, and evaluate the effectiveness of interventions.

In short, statistics help us turn raw data into information that can guide better decisions.

## How To

```python
import pandas as pd

# Load the housing data and preview the first five rows
df = pd.read_csv("data/housing.csv")
df.head()
```

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |


```python
# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
df.describe()
```

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| count | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20433.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 |
| mean | -119.569704 | 35.631861 | 28.639486 | 2635.763081 | 537.870553 | 1425.476744 | 499.53968 | 3.870671 | 206855.816909 |
| std | 2.003532 | 2.135952 | 12.585558 | 2181.615252 | 421.38507 | 1132.462122 | 382.329753 | 1.899822 | 115395.615874 |
| min | -124.35 | 32.54 | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.4999 | 14999.0 |
| 25% | -121.8 | 33.93 | 18.0 | 1447.75 | 296.0 | 787.0 | 280.0 | 2.5634 | 119600.0 |
| 50% | -118.49 | 34.26 | 29.0 | 2127.0 | 435.0 | 1166.0 | 409.0 | 3.5348 | 179700.0 |
| 75% | -118.01 | 37.71 | 37.0 | 3148.0 | 647.0 | 1725.0 | 605.0 | 4.74325 | 264725.0 |
| max | -114.31 | 41.95 | 52.0 | 39320.0 | 6445.0 | 35682.0 | 6082.0 | 15.0001 | 500001.0 |
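
To see what each row of the `describe()` output means, the sketch below recomputes the same statistics by hand for a small Series. The values are hypothetical, not taken from `housing.csv`, and the sketch relies on pandas defaults: `std` is the sample standard deviation (`ddof=1`) and the percentiles use linear interpolation between data points.

```python
import pandas as pd

# Hypothetical toy data (not from housing.csv): five median incomes.
incomes = pd.Series([2.0, 3.0, 3.5, 5.0, 8.0], name="median_income")

summary = incomes.describe()

# Recompute each row of describe() by hand, in the same order.
manual = pd.Series({
    "count": incomes.count(),         # number of non-missing values
    "mean": incomes.mean(),
    "std": incomes.std(),             # sample standard deviation (ddof=1)
    "min": incomes.min(),
    "25%": incomes.quantile(0.25),    # linear interpolation between points
    "50%": incomes.median(),
    "75%": incomes.quantile(0.75),
    "max": incomes.max(),
})

print(summary)
# Largest absolute difference between the two versions (expected to be zero)
print((summary - manual).abs().max())
```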


```python
# Median of every numeric column within each ocean_proximity category
df.groupby("ocean_proximity").median()
```

| ocean_proximity | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| <1H OCEAN | -118.275 | 34.03 | 30.0 | 2108.0 | 438.0 | 1247.0 | 421.0 | 3.875 | 214850.0 |
| INLAND | -120.0 | 36.97 | 23.0 | 2131.0 | 423.0 | 1124.0 | 385.0 | 2.9877 | 108500.0 |
| ISLAND | -118.32 | 33.34 | 52.0 | 1675.0 | 512.0 | 733.0 | 288.0 | 2.7361 | 414700.0 |
| NEAR BAY | -122.25 | 37.79 | 39.0 | 2083.0 | 423.0 | 1033.5 | 406.0 | 3.81865 | 233800.0 |
| NEAR OCEAN | -118.26 | 33.79 | 29.0 | 2195.0 | 464.0 | 1136.5 | 429.0 | 3.64705 | 229450.0 |
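
A grouped median is the same as filtering the rows for one category and taking the median of what remains. The sketch below checks this on a few hypothetical rows whose column names mirror the housing data, and places the mean next to the median to show how a single large value pulls the mean upward while the median stays put.

```python
import pandas as pd

# Hypothetical rows; the column names mirror the housing data.
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "INLAND", "INLAND", "NEAR BAY", "NEAR BAY"],
    "median_house_value": [100000.0, 120000.0, 300000.0, 250000.0, 350000.0],
})

# One median per category; the category labels become the (sorted) index.
by_group = toy.groupby("ocean_proximity")["median_house_value"].median()

# The same value for one category, computed by filtering the rows first.
inland = toy.loc[toy["ocean_proximity"] == "INLAND", "median_house_value"].median()

# Mean next to median: the single 300000.0 pulls the INLAND mean up,
# while the median stays at the middle value.
both = toy.groupby("ocean_proximity")["median_house_value"].agg(["mean", "median"])

print(by_group)
print(inland)
print(both)
```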


```python
# Apply a different set of statistics to each column
df.agg({"longitude": ["min", "max", "mean"],
        "latitude": ["min", "max", "mean"],
        "total_rooms": ["min", "max", "median"],
        "median_income": ["skew"]})
```

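| | longitude | latitude | total_rooms | median_income |
|---|---|---|---|---|
| max | -114.31 | 41.95 | 39320.0 | NaN |
| mean | -119.569704 | 35.631861 | NaN | NaN |
| median | NaN | NaN | 2127.0 | NaN |
| min | -124.35 | 32.54 | 2.0 | NaN |
| skew | NaN | NaN | NaN | 1.646657 |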
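
The `NaN` cells in the `agg()` output appear because each column receives only the statistics listed for it in the dictionary. The toy example below (hypothetical values) reproduces that pattern and shows that the `skew` cell matches calling `Series.skew()` directly; a positive value indicates a longer right tail, as with `median_income` above.

```python
import pandas as pd

# Hypothetical toy data: one large "income" value gives a long right tail.
toy = pd.DataFrame({
    "rooms": [2.0, 4.0, 6.0, 8.0],
    "income": [1.0, 1.5, 2.0, 9.0],
})

# Each column receives only the statistics listed for it; the other
# cells of the result are filled with NaN.
result = toy.agg({"rooms": ["min", "max", "median"],
                  "income": ["skew"]})
print(result)

# The "skew" cell equals calling Series.skew() on that column directly.
print(result.loc["skew", "income"], toy["income"].skew())
```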


```python
# Number of rows in each ocean_proximity category
df["ocean_proximity"].value_counts()
```

```
<1H OCEAN     9136
INLAND        6551
NEAR OCEAN    2658
NEAR BAY      2290
ISLAND           5
Name: ocean_proximity, dtype: int64
```
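
`value_counts()` has two options that are often useful when describing a categorical column: `normalize=True` returns proportions instead of counts, and `dropna=False` keeps missing values as their own category. A minimal sketch on hypothetical labels:

```python
import pandas as pd

# Hypothetical category labels, including one missing value (None).
labels = pd.Series(["INLAND", "<1H OCEAN", "INLAND", "ISLAND", None, "INLAND"])

counts = labels.value_counts()                    # missing values are dropped by default
shares = labels.value_counts(normalize=True)      # proportions of the non-missing values
with_missing = labels.value_counts(dropna=False)  # None is counted as its own category

print(counts)
print(shares)
print(with_missing)
```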

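```python
# Spearman rank correlation between every pair of numeric columns.
# numeric_only=True skips the text column ocean_proximity; pandas 2.0+
# raises an error without it, while older versions dropped it silently.
df.corr(method="spearman", numeric_only=True)
```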

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| longitude | 1.0 | -0.879203 | -0.150752 | 0.04012 | 0.063879 | 0.123527 | 0.06002 | -0.009928 | -0.069667 |
| latitude | -0.879203 | 1.0 | 0.03244 | -0.018435 | -0.056636 | -0.123626 | -0.074299 | -0.088029 | -0.165739 |
| housing_median_age | -0.150752 | 0.03244 | 1.0 | -0.357162 | -0.306544 | -0.283879 | -0.281989 | -0.147308 | 0.074855 |
| total_rooms | 0.04012 | -0.018435 | -0.357162 | 1.0 | 0.915021 | 0.816185 | 0.906734 | 0.271321 | 0.205952 |
| total_bedrooms | 0.063879 | -0.056636 | -0.306544 | 0.915021 | 1.0 | 0.870937 | 0.975627 | -0.006196 | 0.086259 |
| population | 0.123527 | -0.123626 | -0.283879 | 0.816185 | 0.870937 | 1.0 | 0.903872 | 0.006268 | 0.003839 |
| households | 0.06002 | -0.074299 | -0.281989 | 0.906734 | 0.975627 | 0.903872 | 1.0 | 0.030305 | 0.112737 |
| median_income | -0.009928 | -0.088029 | -0.147308 | 0.271321 | -0.006196 | 0.006268 | 0.030305 | 1.0 | 0.676778 |
| median_house_value | -0.069667 | -0.165739 | 0.074855 | 0.205952 | 0.086259 | 0.003839 | 0.112737 | 0.676778 | 1.0 |
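
Spearman's coefficient is Pearson's correlation computed on the ranks of the values, so it measures how consistently two variables move in the same direction rather than how close they lie to a straight line. The sketch below uses hypothetical data where `y = x**3`: the Spearman value is 1 while the Pearson value falls below 1, and ranking the data first reproduces the Spearman result.

```python
import pandas as pd

# Hypothetical toy data: y always increases with x, but not linearly.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson").loc["x", "y"]    # linear association
spearman = toy.corr(method="spearman").loc["x", "y"]  # monotonic association

# Spearman's rho is Pearson's r applied to the ranks of each column.
ranked = toy.rank()
spearman_by_hand = ranked.corr(method="pearson").loc["x", "y"]

print(pearson, spearman, spearman_by_hand)
```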


## Exercise

## Additional Resources

- [Pandas Documentation](https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html)